[wip] deepseek r1 distilled llama 70B #1050


Closed · wants to merge 1 commit

Conversation

@pawalt (Member) commented Jan 20, 2025

Keeping this as a WIP since we probably want to keep the Llama example. The only changes (sketched after this list) are:

  • vLLM version
  • Number of GPUs (full-precision 70B requires 2xH100)
  • Chat format argument required in the new vLLM version
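Roughly, the shape of those changes is as follows; the vLLM version pin and the app/function names here are illustrative, not the exact values in the commit:

import modal

# illustrative pin; the PR just bumps vLLM to a newer release
vllm_image = modal.Image.debian_slim(python_version="3.11").pip_install(
    "vllm==0.6.6"
)

app = modal.App("deepseek-r1-distill-llama-70b")

@app.function(image=vllm_image, gpu="H100:2")  # was one GPU; full-precision 70B needs 2xH100
def serve():
    # the newer vLLM also requires a chat format argument, e.g. passing
    # chat_template_content_format="raw" when constructing OpenAIServingChat
    # (see the snippet further down this thread)
    ...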

@charlesfrye (Collaborator)

🚀 The docs preview is ready! Check it out here: https://modal-labs-examples--frontend-preview-dcedce0.modal.run

@luiscape (Member)

@pawalt looking forward to this one. I have a feeling that fine-tuning R1 would be great for us given its inference ROI.

@charlesfrye (Collaborator)

nice! I think we'll keep the smaller model in the main example, since the download time and cold boots for a 70B model are pretty painful.

@salman1993 commented Jan 26, 2025

@pawalt have you looked into enabling tool calling by any chance?
https://github.com/vllm-project/vllm/blob/6d0e3d372446cde48b387d4d3530e25fc6e06320/vllm/entrypoints/openai/serving_chat.py#L47

I get back responses from the API when I add the enable_auto_tools and tool_parser params, but tool calling doesn't work well. My guess is the granite parser doesn't work for DeepSeek models - we'd need the tool template it was trained with. I also tried the llama3_json parser, but that doesn't work either.

api_server.chat = lambda s: OpenAIServingChat(
    engine,
    ...,  # remaining constructor args unchanged, elided here
    chat_template_content_format="raw",  # the new required chat format argument
    enable_auto_tools=True,  # turn on automatic tool calling
    tool_parser="granite",  # also tried "llama3_json"; neither parses DeepSeek output well
)
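
For context, the kind of request I'm testing with looks roughly like this; the deployment URL and the get_weather schema are placeholders:

from openai import OpenAI

client = OpenAI(
    base_url="https://example--vllm-serve.modal.run/v1",  # placeholder server URL
    api_key="EMPTY",  # vLLM's OpenAI-compatible server doesn't check the key by default
)

response = client.chat.completions.create(
    model="DeepSeek-R1-Distill-Llama-70B",
    messages=[{"role": "user", "content": "What's the weather in Toronto right now?"}],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    tool_choice="auto",
)

print(response.choices[0].message.tool_calls)  # comes back empty, see below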

Example response I get (it should have made a get_weather tool call):

{
  "id": "chatcmpl-52b3d25b480144ff9dfecebf8e739d0c",
  "object": "chat.completion",
  "created": 1737908244,
  "model": "DeepSeek-R1-Distill-Llama-70B",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "<think>\n\n</think>\n\nHi there! I suggest getting online to get real-time information. If you have any other questions, please don't hesitate to let me know!",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": null
    }
  ],
  ...
}

@charlesfrye (Collaborator)

Closing since we now have a DeepSeek-R1 example (cf. #1057).

The github-actions bot deleted the pawalt/deepseek_r1 branch on March 1, 2025 at 15:35.